library(topGO)
library(knitr) # to use kable()
library(limma) # to use vennDiagram()
library(ggplot2)
TAG <- params$TAG
VAR <- params$VAR
ANNOTATION <- list(genes = '../2019-07-26/genes/annotation.txt',
isoforms = '../2019-07-26/transcripts/annotation.txt')
TAGDIR <- paste0('../2020-01-08/', TAG, '/')
This is the enrichment analysis for isoforms regulated by hatching. Because I quantified the association of expression with hatching in two different ways, this document has two sections. Section Variance reports the enrichment analysis performed with isoforms ordered by the proportion of expression variance explained by hatching. Note that some isoforms with low variance may still be highly associated with hatching, even if the fold change between levels of this factor is low. Section Differential expression uses an ordering of isoforms based on the significance of the differential expression between levels of hatching, which does depend on fold change.
Functional annotation is in 2019-07-26. I will also upload two lists of isoforms, with either proportion of variance explained by hatching or p-value of differential expression test.
tagVariance <- read.table(paste0(TAGDIR, VAR, '_variance.txt'))
tagPValue <- read.table(paste0(TAGDIR, VAR, '_pvalue.txt'))
annotation <- read.table(ANNOTATION[[TAG]], col.names = c('tagname', 'goterms'))
To initialize the topGOdata object, I will need the gene list as a named numeric vector, where the names are the isoforms identifiers and the numeric values, either the portion of variance explained by hatching or the p-values of the differential expression test. The structure() function adds attributes to an object.
Variance <- structure(tagVariance[,1], names = row.names(tagVariance))
PValues <- structure(tagPValue[,1], names = row.names(tagPValue))
rm(tagVariance, tagPValue)
There are 33037 isoforms scored with a variance portion and a differential expression p-value. It should actually be the exact same isoforms. The annotation data frame has more than one GO term for every tag, separated by |. I need a named list.
head(annotation)
## tagname
## 1 TCONS_00000002
## 2 TCONS_00000010
## 3 TCONS_00000011
## 4 TCONS_00000016
## 5 TCONS_00000017
## 6 TCONS_00000018
## goterms
## 1 GO:0008417|GO:0016020|GO:0006486
## 2 GO:0043130|GO:0005515|GO:0043161
## 3 GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 6 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
allgenes2GO <- strsplit(as.character(annotation$goterms), "|", fixed = TRUE)
names(allgenes2GO) <- annotation$tagname
rm(annotation)
There are 19735 isoforms with GO annotations. But the differential expression analysis includes many more isoforms. In order to include the not-annotated isoforms in the enrichment analysis, to see if annotated or not annotated isoforms are more or less often differentially expressed, I should attribute a GO term to them. According to [http://geneontology.org/docs/faq/] nowadays we express lack of annotation by annotating to the root nodes, i.e. GO:0008150 biological_process, GO:0003674 molecular_function, and GO:0005575 cellular_component.
for (tag in unique(c(names(PValues), names(Variance)))) {
if (! tag %in% names(allgenes2GO)) {
allgenes2GO <- append(allgenes2GO,
structure(list(c("GO:0008150", "GO:0003674", "GO:0005575")), names = tag))
}
}
Creation of a topGO dataset is documented in section 4 of topGO’s the user manual: https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf. I need to use the new command, and fill up the slots. The annot object must be a function that compiles “a list of GO terms such that each element in the list is a character vector containing all the gene identifiers that are mapped to the respective GO term.” There are several options, that you can check using help(annFUN.gene2GO), for example. The annFUN.gene2GO requires a gene2GO argument, which is the list of gene-to-GO terms I made before. The geneSelectionFun object is a function that selects the interesting (most significant) genes. It is required to perform Fisher’s exact test. The nodeSize is used to prune the GO hierarchy from the terms which have less than n annotated genes.
I generate three datasets, to analyse each of the three ontologies.
selection <- function(allScores) {return(allScores < 0.01)}
GOdata.BP <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'BP',
allGenes = PValues,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdata.MF <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'MF',
allGenes = PValues,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdata.CC <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'CC',
allGenes = PValues,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
Num_Genes = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), numGenes),
Num_GO_terms = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
| ontology | Num_Genes | Num_GO_terms |
|---|---|---|
| BP | 26235 | 1582 |
| MF | 30580 | 791 |
| CC | 21616 | 390 |
There are more than one way to test for enrichment. Something that took me a while to understand is that not only there are different test statistics (Fisher’s exact test, Kolmogorov-Smirnov, and others) but also different algorithms: classic, elim, weight… The algorithms are ways to deal with the dependence structure among GO terms due to topology. Some algorithms are compatible with all statistics implemented in topGO. But the weight and the parentchild algorithms are only applicable to Fisher’s exact test. I am not interested in the classic algorithm, which treats GO nodes as independent, and therefore produces an excess of significant results. I will not use the Fisher’s exact test, because it dependes on an arbitrary threshold of significance on non-adjusted p-values.
BP.elim <- runTest(GOdata.BP, algorithm = "elim", statistic = "ks")
BP.weight01 <- runTest(GOdata.BP, algorithm = "weight01", statistic = "ks")
BP.lea <- runTest(GOdata.BP, algorithm = "lea", statistic = "ks")
MF.elim <- runTest(GOdata.MF, algorithm = "elim", statistic = "ks")
MF.weight01 <- runTest(GOdata.MF, algorithm = "weight01", statistic = "ks")
MF.lea <- runTest(GOdata.MF, algorithm = "lea", statistic = "ks")
CC.elim <- runTest(GOdata.CC, algorithm = "elim", statistic = "ks")
CC.weight01 <- runTest(GOdata.CC, algorithm = "weight01", statistic = "ks")
CC.lea <- runTest(GOdata.CC, algorithm = "lea", statistic = "ks")
ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
algorithm = rep(c("elim", "weight01", "lea"), 3),
TermsTested = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) length(score(X))),
Significant = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) sum(score(X) < 0.01)))
kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
| ontology | algorithm | TermsTested | Significant |
|---|---|---|---|
| BP | elim | 1582 | 24 |
| BP | weight01 | 1582 | 10 |
| BP | lea | 1582 | 37 |
| MF | elim | 791 | 10 |
| MF | weight01 | 791 | 13 |
| MF | lea | 791 | 13 |
| CC | elim | 390 | 2 |
| CC | weight01 | 390 | 1 |
| CC | lea | 390 | 3 |
rm(ResultsSummary)
The topGO package only looks for GO terms that are overrepresented among the top of the gene list (genes with lowest score, assumed to be a \(p\) value; see 2020-06-30). Thus, underrepresented GO terms are ignored, even though it could be interesting to know if among differentially expressed genes there are fewer genes annotated with certain terms than expected by chance. The way to identify those terms would be to reverse the scores. In this case, running the analysis with \(1 - p\) values would be to search for terms that are overrepresented among non-differentially expressed genes. Not worth pursuing now.
Note that, despite my initial confusion (see previous commits), it is not true that significant terms with fewer significant genes than expected are significantly underrepresented terms. It is just possible that a term is significant because of the overall distribution of scores of genes annotated with that term, even if none of those genes (or just fewer than expected) are significant. This is because of the use of Kolmogorov-Smirnov test, instead of the Fisher’s exact test. Note that even though no hard threshold is necessary in the K-S test, GenTable() produces a summary table with number of annotated and significant genes for every significant GO term. That number of significant term is determined by the selection() function passed to the topGO object, only for display purposes unless using Fisher’s exact test.
orderedTerms <- names(sort(score(BP.weight01)))
significant.weight01 <- score(BP.weight01)[orderedTerms] <= 0.01
significant.lea <- score(BP.lea)[orderedTerms] <= 0.01
significant.elim <- score(BP.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.pvalue.sigTerms <- sigTerms
BP.all <- GenTable(GOdata.BP, elim=BP.elim, weight01=BP.weight01, lea=BP.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
write.table(BP.all, file=paste(TAG, VAR, 'BPsummary.tsv', sep='/'),
quote=FALSE, sep='\t', row.names=FALSE)
kable(BP.all,
caption = "Most over-represented terms of the Biological Process ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea |
|---|---|---|---|---|---|---|---|---|
| GO:0001522 | pseudouridine synthesis | 31 | 2 | 0.30 | 2 | 0.0014 | 0.0014 | 0.00143 |
| GO:0046854 | phosphatidylinositol phosphorylation | 45 | 0 | 0.44 | 4 | 0.0027 | 0.0027 | 0.00272 |
| GO:0006400 | tRNA modification | 58 | 1 | 0.57 | 45 | 0.0163 | 0.0029 | 0.01633 |
| GO:0050684 | regulation of mRNA processing | 15 | 1 | 0.15 | 12 | 0.0059 | 0.0038 | 0.00592 |
| GO:0030513 | positive regulation of BMP signaling pat… | 9 | 0 | 0.09 | 8 | 0.0052 | 0.0052 | 0.00525 |
| GO:0060271 | cilium assembly | 73 | 1 | 0.72 | 7 | 0.0049 | 0.0068 | 0.00038 |
| GO:0016579 | protein deubiquitination | 90 | 2 | 0.88 | 14 | 0.0070 | 0.0070 | 0.00696 |
| GO:0048015 | phosphatidylinositol-mediated signaling | 40 | 0 | 0.39 | 20 | 0.0081 | 0.0081 | 0.00806 |
| GO:0070286 | axonemal dynein complex assembly | 9 | 0 | 0.09 | 21 | 0.0082 | 0.0082 | 0.00823 |
| GO:0055072 | iron ion homeostasis | 21 | 0 | 0.21 | 622 | 0.4004 | 0.0094 | 0.40036 |
| GO:0007411 | axon guidance | 11 | 0 | 0.11 | 27 | 0.0118 | 0.0118 | 0.01182 |
| GO:0006418 | tRNA aminoacylation for protein translat… | 87 | 1 | 0.85 | 78 | 0.0333 | 0.0124 | 0.03332 |
| GO:0015937 | coenzyme A biosynthetic process | 8 | 0 | 0.08 | 41 | 0.0147 | 0.0147 | 0.01467 |
| GO:0007275 | multicellular organism development | 85 | 0 | 0.83 | 108 | 0.0469 | 0.0155 | 0.00208 |
| GO:0046434 | organophosphate catabolic process | 52 | 0 | 0.51 | 893 | 0.5923 | 0.0158 | 0.59232 |
| GO:0060828 | regulation of canonical Wnt signaling pa… | 5 | 1 | 0.05 | 46 | 0.0168 | 0.0168 | 0.01683 |
| GO:0050953 | sensory perception of light stimulus | 11 | 0 | 0.11 | 846 | 0.5527 | 0.0172 | 0.55272 |
| GO:0006511 | ubiquitin-dependent protein catabolic pr… | 204 | 2 | 2.00 | 325 | 0.2086 | 0.0178 | 0.20864 |
| GO:0032543 | mitochondrial translation | 5 | 0 | 0.05 | 52 | 0.0186 | 0.0186 | 0.01859 |
| GO:0048017 | inositol lipid-mediated signaling | 50 | 0 | 0.49 | 53 | 0.0188 | 0.0188 | 0.00241 |
| GO:0006488 | dolichol-linked oligosaccharide biosynth… | 11 | 0 | 0.11 | 54 | 0.0189 | 0.0189 | 0.01886 |
| GO:0006654 | phosphatidic acid biosynthetic process | 10 | 0 | 0.10 | 55 | 0.0189 | 0.0189 | 0.01887 |
| GO:0007040 | lysosome organization | 8 | 0 | 0.08 | 58 | 0.0193 | 0.0193 | 0.01933 |
| GO:0006030 | chitin metabolic process | 103 | 0 | 1.01 | 142 | 0.0681 | 0.0222 | 0.06805 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term = Term(GOTERM[sigTerms]),
Definition = Definition(GOTERM[sigTerms]),
PValue=score(BP.weight01)[sigTerms]),
caption = paste('Biological process terms significantly associated with',
VAR, 'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0001522 | pseudouridine synthesis | The intramolecular conversion of uridine to pseudouridine within an RNA molecule. This posttranscriptional base modification occurs in tRNA, rRNA, and snRNAs. | 0.0014295 |
| GO:0046854 | phosphatidylinositol phosphorylation | The process of introducing one or more phosphate groups into a phosphatidylinositol, any glycerophosphoinositol having one phosphatidyl group esterified to one of the hydroxy groups of inositol. | 0.0027221 |
| GO:0050684 | regulation of mRNA processing | Any process that modulates the frequency, rate or extent of mRNA processing, those processes involved in the conversion of a primary mRNA transcript into a mature mRNA prior to its translation into polypeptide. | 0.0038049 |
| GO:0030513 | positive regulation of BMP signaling pathway | Any process that activates or increases the frequency, rate or extent of BMP signaling pathway activity. | 0.0052479 |
| GO:0060271 | cilium assembly | The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. | 0.0067752 |
| GO:0016579 | protein deubiquitination | The removal of one or more ubiquitin groups from a protein. | 0.0069571 |
| GO:0048015 | phosphatidylinositol-mediated signaling | A series of molecular signals in which a cell uses a phosphatidylinositol-mediated signaling to convert a signal into a response. Phosphatidylinositols include phosphatidylinositol (PtdIns) and its phosphorylated derivatives. | 0.0080632 |
| GO:0070286 | axonemal dynein complex assembly | The aggregation, arrangement and bonding together of a set of components to form an axonemal dynein complex, a dynein complex found in eukaryotic cilia and flagella, in which the motor domain heads interact with adjacent microtubules to generate a sliding force which is converted to a bending motion. | 0.0082319 |
I think the GO graph is useful to see the relationship among the significant terms. But too large graphs are impossible to read. I don’t know how to split the graph in meaningful subgraphs.
showSigOfNodes(GOdata.BP, score(BP.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 276
## Number of Edges = 533
##
## $complete.dag
## [1] "A graph with 276 nodes."
This is just a example of the most significant GO term:
showGroupDensity(GOdata.BP, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdata.BP, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdata.BP, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(MF.weight01)))
significant.weight01 <- score(MF.weight01)[orderedTerms] <= 0.01
significant.lea <- score(MF.lea)[orderedTerms] <= 0.01
significant.elim <- score(MF.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.pvalue.sigTerms <- sigTerms
MF.all <- GenTable(GOdata.MF, elim=MF.elim, weight01=MF.weight01, lea=MF.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
write.table(MF.all, file=paste(TAG, VAR, 'MFsummary.tsv', sep='/'),
quote=FALSE, sep='\t', row.names=FALSE)
kable(MF.all,
caption = "Most over-represented terms of the Molecular Function ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea |
|---|---|---|---|---|---|---|---|---|
| GO:0016747 | transferase activity, transferring acyl … | 158 | 3 | 1.52 | 149 | 0.18300 | 0.00036 | 0.20455 |
| GO:0140101 | catalytic activity, acting on a tRNA | 139 | 1 | 1.34 | 2 | 0.00096 | 0.00041 | 0.00096 |
| GO:0035091 | phosphatidylinositol binding | 109 | 3 | 1.05 | 1 | 0.00025 | 0.00045 | 0.00025 |
| GO:0004812 | aminoacyl-tRNA ligase activity | 90 | 1 | 0.87 | 17 | 0.01613 | 0.00098 | 0.01613 |
| GO:0005509 | calcium ion binding | 638 | 6 | 6.13 | 3 | 0.00156 | 0.00156 | 0.00156 |
| GO:0004594 | pantothenate kinase activity | 5 | 0 | 0.05 | 4 | 0.00283 | 0.00283 | 0.00283 |
| GO:0009982 | pseudouridine synthase activity | 27 | 0 | 0.26 | 5 | 0.00300 | 0.00300 | 0.00300 |
| GO:0008417 | fucosyltransferase activity | 42 | 0 | 0.40 | 6 | 0.00422 | 0.00422 | 0.00422 |
| GO:0060090 | molecular adaptor activity | 45 | 2 | 0.43 | 16 | 0.01539 | 0.00449 | 0.01539 |
| GO:0005515 | protein binding | 4270 | 46 | 41.05 | 28 | 0.02495 | 0.00528 | 0.01108 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(MF.weight01)[sigTerms]),
caption = paste('Molecular function terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0140101 | catalytic activity, acting on a tRNA | NA | 0.0004062 |
| GO:0035091 | phosphatidylinositol binding | Interacting selectively and non-covalently with any inositol-containing glycerophospholipid, i.e. phosphatidylinositol (PtdIns) and its phosphorylated derivatives. | 0.0004529 |
| GO:0005509 | calcium ion binding | Interacting selectively and non-covalently with calcium ions (Ca2+). | 0.0015580 |
| GO:0004594 | pantothenate kinase activity | Catalysis of the reaction: ATP + pantothenate = ADP + D-4’-phosphopantothenate. | 0.0028317 |
| GO:0009982 | pseudouridine synthase activity | Catalysis of the reaction: RNA uridine = RNA pseudouridine. Conversion of uridine in an RNA molecule to pseudouridine by rotation of the C1’-N-1 glycosidic bond of uridine in RNA to a C1’-C5. | 0.0030029 |
| GO:0008417 | fucosyltransferase activity | Catalysis of the transfer of a fucosyl group to an acceptor molecule, typically another carbohydrate or a lipid. | 0.0042250 |
| GO:0004630 | phospholipase D activity | Catalysis of the reaction: a phosphatidylcholine + H2O = choline + a phosphatidate. | 0.0059711 |
showSigOfNodes(GOdata.MF, score(MF.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 38
## Number of Edges = 41
##
## $complete.dag
## [1] "A graph with 38 nodes."
showGroupDensity(GOdata.MF, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdata.MF, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdata.MF, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(CC.weight01)))
significant.weight01 <- score(CC.weight01)[orderedTerms] <= 0.01
significant.lea <- score(CC.lea)[orderedTerms] <= 0.01
significant.elim <- score(CC.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.pvalue.sigTerms <- sigTerms
CC.all <- GenTable(GOdata.CC, elim=CC.elim, weight01=CC.weight01, lea=CC.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
write.table(CC.all, file=paste(TAG, VAR, 'CCsummary.tsv', sep='/'),
quote=FALSE, sep='\t', row.names=FALSE)
kable(CC.all,
caption = "Most over-represented terms of the Cellular Component ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea |
|---|---|---|---|---|---|---|---|---|
| GO:1902555 | endoribonuclease complex | 10 | 0 | 0.09 | 10 | 0.020 | 0.004 | 0.020 |
| GO:0005576 | extracellular region | 197 | 2 | 1.72 | 15 | 0.026 | 0.014 | 0.026 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(CC.weight01)[sigTerms]),
caption = paste('Cellular component terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue |
|---|
showSigOfNodes(GOdata.CC, score(CC.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 6
## Number of Edges = 5
##
## $complete.dag
## [1] "A graph with 6 nodes."
showGroupDensity(GOdata.CC, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdata.CC, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdata.CC, orderedTerms[3], rm.one=FALSE)
I need to generate the topGO objects again, using the alternative gene ordering, based on the proportion of expression-level variance explained by hatching. I miss a way to inform the topGOdata object that the score in this case is reversed, with respect to \(p\)-values: the higher it is, the more differentially expressed the gene is. To make sure that GO terms are tested in the same way than when using p-values, I will just reverse the proportion of variance explained by hatching to its complement. Taking this into account, the subset of interesting genes (selection() function) must be defined as the lowest 10% scores, which are the 10% genes with largest portion of variance explained by hatching.
selection <- function(allScores) {return(allScores <= quantile(allScores, probs = 0.10))}
GOdataVar.BP <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'BP',
allGenes = 1.0 - Variance,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdataVar.MF <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'MF',
allGenes = 1.0 - Variance,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdataVar.CC <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'CC',
allGenes = 1.0 - Variance,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
library(knitr)
DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
Num_Genes = sapply(list(GOdataVar.BP, GOdataVar.MF, GOdataVar.CC), numGenes),
Num_GO_terms = sapply(list(GOdataVar.BP, GOdataVar.MF, GOdataVar.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
| ontology | Num_Genes | Num_GO_terms |
|---|---|---|
| BP | 26235 | 1582 |
| MF | 30580 | 791 |
| CC | 21616 | 390 |
BPvar.elim <- runTest(GOdataVar.BP, algorithm = "elim", statistic = "ks")
BPvar.weight01 <- runTest(GOdataVar.BP, algorithm = "weight01", statistic = "ks")
BPvar.lea <- runTest(GOdataVar.BP, algorithm = "lea", statistic = "ks")
MFvar.elim <- runTest(GOdataVar.MF, algorithm = "elim", statistic = "ks")
MFvar.weight01 <- runTest(GOdataVar.MF, algorithm = "weight01", statistic = "ks")
MFvar.lea <- runTest(GOdataVar.MF, algorithm = "lea", statistic = "ks")
CCvar.elim <- runTest(GOdataVar.CC, algorithm = "elim", statistic = "ks")
CCvar.weight01 <- runTest(GOdataVar.CC, algorithm = "weight01", statistic = "ks")
CCvar.lea <- runTest(GOdataVar.CC, algorithm = "lea", statistic = "ks")
ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
algorithm = rep(c("elim", "weight01", "lea"), 3),
TermsTested = sapply(list(BPvar.elim, BPvar.weight01, BPvar.lea, MFvar.elim, MFvar.weight01, MFvar.lea, CCvar.elim, CCvar.weight01, CCvar.lea), function(X) length(score(X))),
Significant = sapply(list(BPvar.elim, BPvar.weight01, BPvar.lea, MFvar.elim, MFvar.weight01, MFvar.lea, CCvar.elim, CCvar.weight01, CCvar.lea), function(X) sum(score(X) < 0.01)))
kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
| ontology | algorithm | TermsTested | Significant |
|---|---|---|---|
| BP | elim | 1582 | 33 |
| BP | weight01 | 1582 | 18 |
| BP | lea | 1582 | 66 |
| MF | elim | 791 | 23 |
| MF | weight01 | 791 | 20 |
| MF | lea | 791 | 29 |
| CC | elim | 390 | 10 |
| CC | weight01 | 390 | 3 |
| CC | lea | 390 | 18 |
rm(ResultsSummary)
orderedTerms <- names(sort(score(BPvar.weight01)))
significant.weight01 <- score(BPvar.weight01)[orderedTerms] <= 0.01
significant.lea <- score(BPvar.lea)[orderedTerms] <= 0.01
significant.elim <- score(BPvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.variance.sigTerms <- sigTerms
BPvar.all <- GenTable(GOdataVar.BP, elim=BPvar.elim, weight01=BPvar.weight01, lea=BPvar.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(BPvar.all,
caption = "Most over-represented terms of the Biological Process ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea |
|---|---|---|---|---|---|---|---|---|
| GO:0006511 | ubiquitin-dependent protein catabolic pr… | 204 | 33 | 20.21 | 9 | 0.00217 | 2.2e-05 | 0.00217 |
| GO:0030513 | positive regulation of BMP signaling pat… | 9 | 3 | 0.89 | 1 | 3.6e-05 | 3.6e-05 | 3.6e-05 |
| GO:0016579 | protein deubiquitination | 90 | 19 | 8.92 | 2 | 8.2e-05 | 8.2e-05 | 8.2e-05 |
| GO:0055072 | iron ion homeostasis | 21 | 2 | 2.08 | 514 | 0.24052 | 0.00011 | 0.24052 |
| GO:0007411 | axon guidance | 11 | 3 | 1.09 | 5 | 0.00074 | 0.00074 | 0.00074 |
| GO:0007264 | small GTPase mediated signal transductio… | 228 | 31 | 22.59 | 10 | 0.00297 | 0.00123 | 0.00297 |
| GO:0046434 | organophosphate catabolic process | 52 | 6 | 5.15 | 362 | 0.15812 | 0.00147 | 0.15812 |
| GO:0032543 | mitochondrial translation | 5 | 1 | 0.50 | 8 | 0.00192 | 0.00192 | 0.00192 |
| GO:0006030 | chitin metabolic process | 103 | 10 | 10.20 | 21 | 0.00606 | 0.00202 | 0.00606 |
| GO:0006486 | protein glycosylation | 106 | 10 | 10.50 | 6 | 0.00083 | 0.00292 | 0.00083 |
| GO:0001522 | pseudouridine synthesis | 31 | 9 | 3.07 | 13 | 0.00328 | 0.00328 | 0.00328 |
| GO:0006400 | tRNA modification | 58 | 12 | 5.75 | 59 | 0.01725 | 0.00335 | 0.01725 |
| GO:0006997 | nucleus organization | 5 | 3 | 0.50 | 14 | 0.00342 | 0.00342 | 0.00342 |
| GO:0035307 | positive regulation of protein dephospho… | 5 | 2 | 0.50 | 18 | 0.00516 | 0.00516 | 0.00516 |
| GO:0060271 | cilium assembly | 73 | 6 | 7.23 | 3 | 0.00039 | 0.00532 | 0.00039 |
| GO:0046907 | intracellular transport | 422 | 60 | 41.81 | 139 | 0.04443 | 0.00660 | 0.00088 |
| GO:0015937 | coenzyme A biosynthetic process | 8 | 3 | 0.79 | 24 | 0.00668 | 0.00668 | 0.00668 |
| GO:0034474 | U2 snRNA 3’-end processing | 17 | 4 | 1.68 | 30 | 0.00915 | 0.00915 | 0.00915 |
| GO:0006545 | glycine biosynthetic process | 7 | 0 | 0.69 | 34 | 0.01016 | 0.01016 | 0.01016 |
| GO:0006397 | mRNA processing | 158 | 35 | 15.65 | 12 | 0.00322 | 0.01045 | 0.00322 |
| GO:0010972 | negative regulation of G2/M transition o… | 5 | 0 | 0.50 | 37 | 0.01103 | 0.01103 | 0.01103 |
| GO:0030163 | protein catabolic process | 241 | 35 | 23.87 | 7 | 0.00137 | 0.01230 | 5.1e-05 |
| GO:1903047 | mitotic cell cycle process | 86 | 8 | 8.52 | 631 | 0.32396 | 0.01245 | 0.65910 |
| GO:0050953 | sensory perception of light stimulus | 11 | 2 | 1.09 | 824 | 0.46346 | 0.01444 | 0.46346 |
| GO:0007029 | endoplasmic reticulum organization | 13 | 2 | 1.29 | 49 | 0.01522 | 0.01522 | 0.01522 |
| GO:0038007 | netrin-activated signaling pathway | 5 | 1 | 0.50 | 50 | 0.01527 | 0.01527 | 0.01527 |
| GO:0034314 | Arp2/3 complex-mediated actin nucleation | 17 | 4 | 1.68 | 52 | 0.01539 | 0.01539 | 0.01539 |
| GO:0035195 | gene silencing by miRNA | 5 | 1 | 0.50 | 53 | 0.01546 | 0.01546 | 0.01546 |
| GO:0070286 | axonemal dynein complex assembly | 9 | 2 | 0.89 | 56 | 0.01634 | 0.01634 | 0.01634 |
| GO:0046854 | phosphatidylinositol phosphorylation | 45 | 11 | 4.46 | 63 | 0.01918 | 0.01918 | 0.01918 |
| GO:0006904 | vesicle docking involved in exocytosis | 22 | 6 | 2.18 | 67 | 0.01959 | 0.01959 | 0.01959 |
| GO:0006418 | tRNA aminoacylation for protein translat… | 87 | 7 | 8.62 | 60 | 0.01839 | 0.01962 | 0.01839 |
| GO:0051270 | regulation of cellular component movemen… | 5 | 1 | 0.50 | 68 | 0.01975 | 0.01975 | 0.01975 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(BPvar.weight01)[sigTerms]),
caption = paste('Biological process terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0006511 | ubiquitin-dependent protein catabolic process | The chemical reactions and pathways resulting in the breakdown of a protein or peptide by hydrolysis of its peptide bonds, initiated by the covalent attachment of a ubiquitin group, or multiple ubiquitin groups, to the protein. | 0.0000222 |
| GO:0030513 | positive regulation of BMP signaling pathway | Any process that activates or increases the frequency, rate or extent of BMP signaling pathway activity. | 0.0000357 |
| GO:0016579 | protein deubiquitination | The removal of one or more ubiquitin groups from a protein. | 0.0000819 |
| GO:0007411 | axon guidance | The chemotaxis process that directs the migration of an axon growth cone to a specific target site in response to a combination of attractive and repulsive cues. | 0.0007397 |
| GO:0007264 | small GTPase mediated signal transduction | Any series of molecular signals in which a small monomeric GTPase relays one or more of the signals. | 0.0012329 |
| GO:0032543 | mitochondrial translation | The chemical reactions and pathways resulting in the formation of a protein in a mitochondrion. This is a ribosome-mediated process in which the information in messenger RNA (mRNA) is used to specify the sequence of amino acids in the protein; the mitochondrion has its own ribosomes and transfer RNAs, and uses a genetic code that differs from the nuclear code. | 0.0019207 |
| GO:0006030 | chitin metabolic process | The chemical reactions and pathways involving chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. | 0.0020236 |
| GO:0006486 | protein glycosylation | A protein modification process that results in the addition of a carbohydrate or carbohydrate derivative unit to a protein amino acid, e.g. the addition of glycan chains to proteins. | 0.0029192 |
| GO:0001522 | pseudouridine synthesis | The intramolecular conversion of uridine to pseudouridine within an RNA molecule. This posttranscriptional base modification occurs in tRNA, rRNA, and snRNAs. | 0.0032848 |
| GO:0006997 | nucleus organization | A process that is carried out at the cellular level which results in the assembly, arrangement of constituent parts, or disassembly of the nucleus. | 0.0034201 |
| GO:0035307 | positive regulation of protein dephosphorylation | Any process that activates or increases the frequency, rate or extent of removal of phosphate groups from a protein. | 0.0051619 |
| GO:0060271 | cilium assembly | The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. | 0.0053173 |
| GO:0015937 | coenzyme A biosynthetic process | The chemical reactions and pathways resulting in the formation of coenzyme A, 3’-phosphoadenosine-(5’)diphospho(4’)pantatheine, an acyl carrier in many acylation and acyl-transfer reactions in which the intermediate is a thiol ester. | 0.0066754 |
| GO:0034474 | U2 snRNA 3’-end processing | Any process involved in forming the mature 3’ end of a U2 snRNA molecule. | 0.0091518 |
showSigOfNodes(GOdataVar.BP, score(BPvar.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 381
## Number of Edges = 772
##
## $complete.dag
## [1] "A graph with 381 nodes."
Below I plot variance portion, but for the termo found most significant when using p-values, for comparison.
showGroupDensity(GOdataVar.BP, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdataVar.BP, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdataVar.BP, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(MFvar.weight01)))
significant.weight01 <- score(MFvar.weight01)[orderedTerms] <= 0.01
significant.lea <- score(MFvar.lea)[orderedTerms] <= 0.01
significant.elim <- score(MFvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.variance.sigTerms <- sigTerms
MFvar.all <- GenTable(GOdataVar.MF, elim=MFvar.elim, weight01=MFvar.weight01, lea=MFvar.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(MFvar.all,
caption = "Most over-represented terms of the Molecular Function ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea |
|---|---|---|---|---|---|---|---|---|
| GO:0005515 | protein binding | 4270 | 487 | 418.76 | 4 | 0.00056 | 5.9e-05 | 0.00056 |
| GO:0036459 | thiol-dependent ubiquitinyl hydrolase ac… | 103 | 23 | 10.10 | 1 | 3e-05 | 6.7e-05 | 3e-05 |
| GO:0038023 | signaling receptor activity | 471 | 34 | 46.19 | 325 | 0.41565 | 0.00025 | 0.48567 |
| GO:0035091 | phosphatidylinositol binding | 109 | 23 | 10.69 | 2 | 0.00033 | 0.00055 | 0.00033 |
| GO:0008061 | chitin binding | 112 | 10 | 10.98 | 5 | 0.00061 | 0.00061 | 0.00061 |
| GO:0004298 | threonine-type endopeptidase activity | 31 | 4 | 3.04 | 6 | 0.00087 | 0.00087 | 0.00087 |
| GO:0140101 | catalytic activity, acting on a tRNA | 139 | 16 | 13.63 | 10 | 0.00161 | 0.00118 | 0.00161 |
| GO:0008417 | fucosyltransferase activity | 42 | 4 | 4.12 | 8 | 0.00136 | 0.00136 | 0.00136 |
| GO:0004594 | pantothenate kinase activity | 5 | 2 | 0.49 | 9 | 0.00142 | 0.00142 | 0.00142 |
| GO:0003676 | nucleic acid binding | 2125 | 239 | 208.40 | 3 | 0.00052 | 0.00238 | 0.00052 |
| GO:0016417 | S-acyltransferase activity | 6 | 4 | 0.59 | 11 | 0.00247 | 0.00247 | 0.00247 |
| GO:0016742 | hydroxymethyl-, formyl- and related tran… | 9 | 3 | 0.88 | 13 | 0.00276 | 0.00276 | 0.00276 |
| GO:0016887 | ATPase activity | 327 | 43 | 32.07 | 212 | 0.24838 | 0.00715 | 0.24838 |
| GO:0004812 | aminoacyl-tRNA ligase activity | 90 | 7 | 8.83 | 42 | 0.03055 | 0.00828 | 0.03055 |
| GO:0060090 | molecular adaptor activity | 45 | 13 | 4.41 | 7 | 0.00121 | 0.00840 | 0.00121 |
| GO:0005220 | inositol 1,4,5-trisphosphate-sensitive c… | 15 | 2 | 1.47 | 18 | 0.00879 | 0.00879 | 0.00879 |
| GO:0070679 | inositol 1,4,5 trisphosphate binding | 15 | 2 | 1.47 | 19 | 0.00879 | 0.00879 | 0.00879 |
| GO:0005516 | calmodulin binding | 52 | 10 | 5.10 | 20 | 0.00948 | 0.00948 | 0.00948 |
| GO:0003993 | acid phosphatase activity | 10 | 1 | 0.98 | 22 | 0.00974 | 0.00974 | 0.00974 |
| GO:0004312 | fatty acid synthase activity | 6 | 3 | 0.59 | 23 | 0.00998 | 0.00998 | 0.00998 |
| GO:0046527 | glucosyltransferase activity | 17 | 4 | 1.67 | 16 | 0.00693 | 0.01026 | 0.00693 |
| GO:0005102 | signaling receptor binding | 47 | 7 | 4.61 | 112 | 0.13378 | 0.01048 | 0.13378 |
| GO:0016748 | succinyltransferase activity | 5 | 3 | 0.49 | 24 | 0.01065 | 0.01065 | 0.01065 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(MFvar.weight01)[sigTerms]),
caption = paste('Molecular functions terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0005515 | protein binding | Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). | 0.0000586 |
| GO:0036459 | thiol-dependent ubiquitinyl hydrolase activity | Catalysis of the thiol-dependent hydrolysis of an ester, thioester, amide, peptide or isopeptide bond formed by the C-terminal glycine of ubiquitin. | 0.0000670 |
| GO:0035091 | phosphatidylinositol binding | Interacting selectively and non-covalently with any inositol-containing glycerophospholipid, i.e. phosphatidylinositol (PtdIns) and its phosphorylated derivatives. | 0.0005496 |
| GO:0008061 | chitin binding | Interacting selectively and non-covalently with chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. | 0.0006105 |
| GO:0004298 | threonine-type endopeptidase activity | Catalysis of the hydrolysis of internal peptide bonds in a polypeptide chain by a mechanism in which the hydroxyl group of a threonine residue at the active center acts as a nucleophile. | 0.0008677 |
| GO:0140101 | catalytic activity, acting on a tRNA | NA | 0.0011815 |
| GO:0008417 | fucosyltransferase activity | Catalysis of the transfer of a fucosyl group to an acceptor molecule, typically another carbohydrate or a lipid. | 0.0013571 |
| GO:0004594 | pantothenate kinase activity | Catalysis of the reaction: ATP + pantothenate = ADP + D-4’-phosphopantothenate. | 0.0014241 |
| GO:0003676 | nucleic acid binding | Interacting selectively and non-covalently with any nucleic acid. | 0.0023808 |
| GO:0016417 | S-acyltransferase activity | Catalysis of the transfer of an acyl group to a sulfur atom on the acceptor molecule. | 0.0024717 |
| GO:0016742 | hydroxymethyl-, formyl- and related transferase activity | Catalysis of the transfer of a hydroxymethyl- or formyl group from one compound (donor) to another (acceptor). | 0.0027562 |
| GO:0060090 | molecular adaptor activity | The binding activity of a molecule that brings together two or more molecules through a selective, non-covalent, often stoichiometric interaction, permitting those molecules to function in a coordinated way. | 0.0083968 |
| GO:0005220 | inositol 1,4,5-trisphosphate-sensitive calcium-release channel activity | Enables the transmembrane transfer of a calcium ion by a channel that opens when inositol 1,4,5-trisphosphate (IP3) has been bound by the channel complex or one of its constituent parts. | 0.0087864 |
| GO:0070679 | inositol 1,4,5 trisphosphate binding | Interacting selectively and non-covalently with inositol 1,4,5 trisphosphate. | 0.0087864 |
| GO:0005516 | calmodulin binding | Interacting selectively and non-covalently with calmodulin, a calcium-binding protein with many roles, both in the calcium-bound and calcium-free states. | 0.0094780 |
| GO:0003993 | acid phosphatase activity | Catalysis of the reaction: an orthophosphoric monoester + H2O = an alcohol + phosphate, with an acid pH optimum. | 0.0097424 |
| GO:0004312 | fatty acid synthase activity | Catalysis of the reaction: acetyl-CoA + n malonyl-CoA + 2n NADPH + 2n H+ = long-chain fatty acid + n+1 CoA + n CO2 + 2n NADP+. | 0.0099792 |
showSigOfNodes(GOdataVar.MF, score(MFvar.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 87
## Number of Edges = 107
##
## $complete.dag
## [1] "A graph with 87 nodes."
I plot variance portion, but for the term found most significant when using p-values, for comparison.
showGroupDensity(GOdataVar.MF, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdataVar.MF, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdataVar.MF, orderedTerms[3], rm.one=FALSE)
orderedTerms <- names(sort(score(CCvar.weight01)))
significant.weight01 <- score(CCvar.weight01)[orderedTerms] <= 0.01
significant.lea <- score(CCvar.lea)[orderedTerms] <= 0.01
significant.elim <- score(CCvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
if (length(sigTerms) == 0) {
sigTerms <- orderedTerms[significant.lea & significant.elim]
}
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.variance.sigTerms <- sigTerms
CCvar.all <- GenTable(GOdataVar.CC, elim=CCvar.elim, weight01=CCvar.weight01, lea=CCvar.lea,
orderBy="weight01", ranksOf="elim", topNodes=max(sum(significant.elim), 10))
kable(CCvar.all,
caption = "Most over-represented terms of the Cellular Component ontology.")
| GO.ID | Term | Annotated | Significant | Expected | Rank in elim | elim | weight01 | lea |
|---|---|---|---|---|---|---|---|---|
| GO:0016272 | prefoldin complex | 7 | 0 | 0.66 | 6 | 0.0053 | 0.0053 | 0.00531 |
| GO:0005737 | cytoplasm | 1176 | 121 | 110.82 | 22 | 0.0242 | 0.0073 | 0.00071 |
| GO:0045239 | tricarboxylic acid cycle enzyme complex | 5 | 3 | 0.47 | 10 | 0.0100 | 0.0100 | 0.00996 |
| GO:0005815 | microtubule organizing center | 96 | 15 | 9.05 | 28 | 0.0286 | 0.0105 | 0.15049 |
| GO:1902555 | endoribonuclease complex | 10 | 3 | 0.94 | 46 | 0.0602 | 0.0105 | 0.06022 |
| GO:0005761 | mitochondrial ribosome | 18 | 4 | 1.70 | 3 | 0.0035 | 0.0111 | 0.00347 |
| GO:0005576 | extracellular region | 197 | 21 | 18.56 | 27 | 0.0282 | 0.0130 | 0.02818 |
| GO:0030015 | CCR4-NOT core complex | 15 | 3 | 1.41 | 13 | 0.0153 | 0.0153 | 0.01528 |
| GO:0005802 | trans-Golgi network | 12 | 1 | 1.13 | 14 | 0.0158 | 0.0158 | 0.01577 |
| GO:0031981 | nuclear lumen | 280 | 37 | 26.39 | 24 | 0.0253 | 0.0162 | 0.02534 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))
kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(CCvar.weight01)[sigTerms]),
caption = paste('Cellular component terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
| Term | Definition | PValue | |
|---|---|---|---|
| GO:0016272 | prefoldin complex | A multisubunit chaperone that is capable of delivering unfolded proteins to cytosolic chaperonin, which it acts as a cofactor for. In humans, the complex is a heterohexamer of two PFD-alpha and four PFD-beta type subunits. In Saccharomyces cerevisiae, it also acts in the nucleus to regulate the rate of elongation by RNA polymerase II via a direct effect on histone dynamics. | 0.0053057 |
| GO:0045239 | tricarboxylic acid cycle enzyme complex | Any of the heteromeric enzymes that act in the TCA cycle. | 0.0099588 |
showSigOfNodes(GOdataVar.CC, score(CCvar.weight01),
firstSigNodes = sum(significant.weight01),
wantedNodes = sigTerms)
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 10
## Number of Edges = 13
##
## $complete.dag
## [1] "A graph with 10 nodes."
For comparison, I plot the distribution of variance portion for the CC term found most significant when using p-values.
showGroupDensity(GOdataVar.CC, orderedTerms[1], rm.one=FALSE)
showGroupDensity(GOdataVar.CC, orderedTerms[2], rm.one=FALSE)
showGroupDensity(GOdataVar.CC, orderedTerms[3], rm.one=FALSE)
allTerms <- usedGO(GOdata.BP)
vennDiagram(vennCounts(cbind(PValue = allTerms %in% BP.pvalue.sigTerms,
Variance = allTerms %in% BP.variance.sigTerms)))
ggplot(data = data.frame(PValue = rank(score(BP.weight01))[allTerms],
Variance = rank(score(BPvar.weight01))[allTerms]),
mapping = aes(x = PValue, y = Variance)) +
geom_point() + geom_smooth(method='lm') + xlab('Using gene p-values') +
ylab('Using portion of explained variance') +
ggtitle('Ordering of BP terms by significance')
allTerms <- usedGO(GOdata.MF)
vennDiagram(vennCounts(cbind(PValue = allTerms %in% MF.pvalue.sigTerms,
Variance = allTerms %in% MF.variance.sigTerms)))
ggplot(data = data.frame(PValue = rank(score(MF.weight01))[allTerms],
Variance = rank(score(MFvar.weight01))[allTerms]),
mapping = aes(x = PValue, y = Variance)) +
geom_point() + geom_smooth(methdo='lm') + xlab('Using gene p-values') +
ylab('Using portion of explained variance') +
ggtitle('Ordering of MF terms by significance')
## Warning: Ignoring unknown parameters: methdo
allTerms <- usedGO(GOdata.CC)
vennDiagram(vennCounts(cbind(PValue = allTerms %in% CC.pvalue.sigTerms,
Variance = allTerms %in% CC.variance.sigTerms)))
ggplot(data = data.frame(PValue = rank(score(CC.weight01))[allTerms],
Variance = rank(score(CCvar.weight01))[allTerms]),
mapping = aes(x = PValue, y = Variance)) +
geom_point() + geom_smooth(method='lm') + xlab('Using gene p-values') +
ylab('Using portion of explained variance') +
ggtitle('Ordering of CC terms by significance')
I save the main variables to be able to load them in a new R session later.
save(allgenes2GO,
GOdata.BP, BP.elim, BP.weight01, BP.lea, BP.pvalue.sigTerms,
GOdata.MF, MF.elim, MF.weight01, MF.lea, MF.pvalue.sigTerms,
GOdata.CC, CC.elim, CC.weight01, CC.lea, CC.pvalue.sigTerms,
GOdataVar.BP, BPvar.elim, BPvar.weight01, BPvar.lea, BP.variance.sigTerms,
GOdataVar.MF, MFvar.elim, MFvar.weight01, MFvar.lea, MF.variance.sigTerms,
GOdataVar.CC, CCvar.elim, CCvar.weight01, CCvar.lea, CC.variance.sigTerms,
file = paste('Enrichment', TAG, VAR, 'RData', sep='.'))
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=ca_ES.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=ca_ES.UTF-8 LC_COLLATE=ca_ES.UTF-8
## [5] LC_MONETARY=ca_ES.UTF-8 LC_MESSAGES=ca_ES.UTF-8
## [7] LC_PAPER=ca_ES.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] Rgraphviz_2.30.0 ggplot2_3.3.0 limma_3.42.2
## [4] knitr_1.28 topGO_2.38.1 SparseM_1.78
## [7] GO.db_3.10.0 AnnotationDbi_1.48.0 IRanges_2.20.2
## [10] S4Vectors_0.24.3 Biobase_2.46.0 graph_1.64.0
## [13] BiocGenerics_0.32.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.0.0 xfun_0.12 purrr_0.3.3 splines_3.6.3
## [5] lattice_0.20-41 colorspace_1.4-1 vctrs_0.2.4 htmltools_0.4.0
## [9] yaml_2.2.1 mgcv_1.8-33 blob_1.2.1 rlang_0.4.5
## [13] pillar_1.4.3 glue_1.4.0 withr_2.1.2 DBI_1.1.0
## [17] bit64_0.9-7 matrixStats_0.55.0 lifecycle_0.2.0 stringr_1.4.0
## [21] munsell_0.5.0 gtable_0.3.0 memoise_1.1.0 evaluate_0.14
## [25] labeling_0.3 fansi_0.4.1 highr_0.8 Rcpp_1.0.4
## [29] scales_1.1.0 farver_2.0.3 bit_1.1-15.2 digest_0.6.25
## [33] stringi_1.4.6 dplyr_0.8.5 cli_2.0.2 tools_3.6.3
## [37] magrittr_1.5 RSQLite_2.2.0 tibble_3.0.0 crayon_1.3.4
## [41] pkgconfig_2.0.3 Matrix_1.2-18 ellipsis_0.3.0 assertthat_0.2.1
## [45] rmarkdown_2.1 R6_2.4.1 nlme_3.1-149 compiler_3.6.3